Viewing 0 current events matching “hadoop datascience bigdata” by Date.

No events were found.

Viewing 2 past events matching “hadoop datascience bigdata” by Date.

Wednesday
Jun 27, 2012
Hadoop and Data Science Meetup
Cloudability

William Taylor will start out by reviewing news items related to data science and big data. This will follow the format of the March meeting, when David Price led a similar review and we had a lively discussion of ethical and technical issues. That was a lot of fun, so we're excited to try it again.

Robert Brehm will present a bibliography of recent journal papers of particular interest to the group.

Wednesday
Jul 25, 2012
Hadoop and Data Science Meetup
Cloudability

This month's meetup will start with a discussion of news items related to data science and big data, led by William Taylor.

This month's presentation is by Temese Szalai.

Title: Asking Questions About Big Data: A Basic How-To For Framing Problems When Working With (Unstructured Text) Data At Scale

Summary: Data is only as valuable as the questions we ask about it. The questions worth asking are those that yield valuable insights and quantifiable results, and whose answers lead to actionable information, i.e., help make a decision or meet the requirements of the people and systems consuming the analysis or output. Identifying good questions to ask, and knowing how to proceed with very large data sets, is at the very heart of being a data scientist.

When working with large data sets, especially unstructured or semi-structured text data, asking questions and getting started is not always easy. In fact, it's sometimes the hardest part. Drawing on her experience working with text data at scale, Temese will talk about strategies and methodologies for approaching this kind of data during initial discovery and analysis. She'll also cover some of the basic tools and techniques available, along with basic best practices. Although unstructured text data is the focus, the talk should be general enough to apply to analyzing other kinds of data as well.

Speaker Bio: Temese Szalai has worked as an industrial computational linguist/taxonomist for 13 years. Presently, she is the founder of Madarka, which leverages semantic analysis of large unstructured corpora for psychographic consumer segmentation.